A Sublinear-Time Randomized Approximation Scheme for the Robinson-Foulds Metric

نویسندگان

  • Nicholas D. Pattengale
  • Bernard M. E. Moret
چکیده

The Robinson-Foulds (RF) metric is the measure most widely used in comparing phylogenetic trees; it can be computed in linear time using Day’s algorithm. When faced with the need to compare large numbers of large trees, however, even linear time becomes prohibitive. We present a randomized approximation scheme that provides, with high probability, a (1+ε) approximation of the true RF metric for all pairs of trees in a given collection. Our approach is to use a sublinear-space embedding of the trees, combined with an application of the Johnson-Lindenstrauss lemma to approximate vector norms very rapidly. We discuss the consequences of various parameter choices (in the embedding and in the approximation requirements). We also implemented our algorithm as a Java class that can easily be combined with popular packages such as Mesquite; in consequence, we present experimental results illustrating the precision and running-time tradeoffs as well as demonstrating the speed of our approach.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficiently Computing the Robinson-Foulds Metric

The Robinson-Foulds (RF) metric is the measure most widely used in comparing phylogenetic trees; it can be computed in linear time using Day's algorithm. When faced with the need to compare large numbers of large trees, however, even linear time becomes prohibitive. We present a randomized approximation scheme that provides, in sublinear time and with high probability, a (1 + epsilon) approxima...

متن کامل

Discriminative Measures for Comparison of Phylogenetic Trees

In this paper we introduce and study three new measures for efficient discriminative comparison of phylogenetic trees. The NNI navigation dissimilarity dnav counts the steps along a “combing” of the Nearest Neighbor Interchange (NNI) graph of binary hierarchies, providing an efficient approximation to the (NP-hard) NNI distance in terms of “edit length”. At the same time, a closed form formula ...

متن کامل

ARTUR CZUMAJ AND CHRISTIAN SOHLER Title: ESTIMATING THE WEIGHT OF METRIC MINIMUM SPANNING TREES IN SUBLINEAR TIME

In this paper we present a sublinear-time (1+ε)-approximation randomized algorithm to estimate the weight of the minimum spanning tree of an n-point metric space. The running time of the algorithm is Õ(n/εO(1)). Since the full description of an n-point metric space is of size Θ(n2), the complexity of our algorithm is sublinear with respect to the input size. Our algorithm is almost optimal as i...

متن کامل

The Generalized Robinson-Foulds Metric

The Robinson-Foulds (RF) metric is arguably the most widely used measure of phylogenetic tree similarity, despite its well-known shortcomings: For example, moving a single taxon in a tree can result in a tree that has maximum distance to the original one; but the two trees are identical if we remove the single taxon. To this end, we propose a natural extension of the RF metric that does not sim...

متن کامل

Facility Location in Sublinear Time

In this paper we present a randomized constant factor approximation algorithm for the problem of computing the optimal cost of the metric Minimum Facility Location problem, in the case of uniform costs and uniform demands, and in which every point can open a facility. By exploiting the fact that we are approximating the optimal cost without computing an actual solution, we give the first algori...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006